Most prokaryotes contain CRISPR-Cas immune systems that provide protection against mobile genetic elements. We
have focused on the ability of CRISPR-Cas to block plasmid conjugation, and analyzed the position of target sequences
(protospacers) on conjugative plasmids. The analysis reveals that protospacers are non-uniformly distributed over
plasmid regions in a pattern that is determined by the plasmid's mobilization type (MOB). While MOB P plasmids are most
frequently targeted in the region entering the recipient cell last (lagging region), MOB F plasmids are mostly targeted in
the region entering the recipient cell first (leading region). To explain this protospacer distribution bias, we propose two
mutually non-exclusive hypotheses: (1) spacers are acquired more frequently from either the leading or lagging region
depending on the MOB type; (2) CRISPR-interference is more efficient when spacers target these preferred regions. To
test the latter hypothesis, we analyzed Type I-E CRISPR-interference against MOB F prototype plasmid F in Escherichia coli.
Our results show that plasmid conjugation is effectively inhibited, but the level of immunity is not affected by targeting
the plasmid in the leading or lagging region. Moreover, CRISPR-immunity levels do not depend on whether the incoming
single-stranded plasmid DNA, or the DNA strand synthesized in the recipient is targeted. Our findings indicate that
single-stranded DNA may not be a target for Type I-E CRISPR-Cas systems, and suggest that the protospacer distribution
bias might be due to spacer acquisition preferences.



Introduction 

Bacterial and archaeal genomes have been shaped to a considerable 
extent by events of horizontal gene transfer (HGT). 1,2 The three 
main routes of HGT are transformation (DNA uptake from the 
environment), transduction (virus or phage mediated DNA trans- 
fer) and conjugation (plasmid transfer through mating between 
self-transmissible plasmid containing donor cells and plasmid free 
recipient cells). 3 Whether or not the invasion provides a fitness gain
to the host depends on the nature of the incoming DNA 4,5 and on
the genetic background of the host. 6-9 In addition, the impact of
a particular HGT event may depend on environmental parameters, 4,10-12
such as the presence or absence of antibiotics, toxic
metal ions and nutrients. Upon a change in environment, previ- 
ously beneficial DNA may no longer provide a selective advantage, 
and instead can reduce host fitness. 13 In addition to DNA uptake 
mechanisms, bacteria therefore require systems that either block 
the entry of alien DNA or can remove such DNA from the cell. 
CRISPR-Cas (clustered regularly interspaced short palindromic
repeats/CRISPR-associated) is a widespread prokaryotic
adaptive and heritable immune system that specifically degrades 
non-self DNA from cells. 14 Three main types of CRISPR-Cas 
systems are currently recognized, 15 and these show many strik- 
ing mechanistic and structural dissimilarities (reviewed in refs. 
16 and 17). A universal property of CRISPR-Cas systems is 
that they make use of a genomic CRISPR locus for integration 
of short sequences derived from invader genomes. The invader 
sequences (spacers) in a CRISPR array are typically 30 nt each 
and are separated from each other by host-derived repeat- 
ing sequences of approximately the same size. The acquisition 
of new spacer sequences during the CRISPR-adaptation stage 
provides resistance against genetic elements containing cognate 
sequences. 18-21 During the expression and interference stages, the
CRISPR is transcribed into precursor CRISPR RNA, which is 
subsequently cleaved in the repeat sequences by a Cas endoribonuclease
in Type-I and Type-III systems, and by RNase III in
Type-II systems. 22-25 The resulting processed crRNA is further
trimmed from the 3' or 5' end in some cases, 25-27 to yield mature
crRNA species. Cas proteins utilize mature crRNAs to bind and
cleave nucleic acids containing a complementary sequence. 28-35

CRISPR systems appear to be active against all forms of 
invading DNA. A high number of studies have demonstrated 
that CRISPR systems can provide adaptive immunity against 
phage infections under laboratory conditions. 18,22,36-42 In addition,
CRISPR-Cas systems can mediate plasmid curing and resistance
against plasmid transformation. 19,30,42-45 A recent paper indicates
that CRISPR systems can be active against conjugative transpo- 
sons, 46 and two studies reported on CRISPR-mediated resistance 
against conjugative plasmids. 45,47 In agreement with this, analyses 
of the CRISPR content of bacteria from environmental samples 
recovered spacer sequences that match a variety of known viral 
and plasmid sequences, 48-54 or can even be used for identifying
new mobile genetic elements. 55 

Whereas CRISPR-mediated protection against virulent 
phages provides a clear selective advantage to the host cell, the 
outcome of targeting conjugative plasmids is expected to be cru- 
cially dependent on environmental parameters and accessory 
genes encoded by the plasmid. Excess baggage theory predicts 
that when a plasmid does not encode proteins that provide a 
selective advantage, the presence of the plasmid results in a fit- 
ness cost for the host. 13 A number of experimental studies have 
indeed reported a fitness cost for the host associated with plasmid 
carriage. 6,9,10,56-59 However, some studies have reported examples
where plasmid carriage by a bacterial strain in the absence of 
selective pressure on plasmid maintenance does not incur a fit- 
ness cost to the host, or can even provide a fitness gain under 
laboratory growth conditions. 9,60,61 It appears likely, however, that 
a host's fitness cost or benefit for carrying a plasmid is strongly 
dependent on the environmental conditions (e.g., nutrient avail- 
ability, temperature, presence of other mobile DNA elements, 
etc.). For example, a number of phages use pili encoded by con- 
jugative plasmids as receptors for adsorption to the host cell (e.g., 
refs. 62 and 63). In contrast, certain plasmid-encoded toxin- 
antitoxin loci provide both plasmid stability and phage-resistance 
phenotypes, and are maintained in either the presence or absence 
of viruses. 64 Hence, depending on the conditions, plasmid loss 
can provide a fitness gain to the host cell. Plasmid fitness on the 
other hand is determined by its ability to spread both vertically 
(dependent on copy number, 65 stable partitioning [par] genes 66
and multimer resolution systems 67) and horizontally [dependent
on plasmid mobility genes and plasmid size (transduction)] to 
new hosts. Furthermore, many plasmids carry genes encoding 
addiction systems 68,69 to avoid segregational loss (reviewed in 
ref. 70). 

Here, we elaborate on the role of CRISPR-Cas systems in 
targeting conjugative plasmids. By analyzing all spacers that 
show complementarity toward plasmids containing an origin 
of transfer (oriT, the site that allows for plasmid transfer via 
conjugation), we demonstrate that protospacers are distributed 
non-randomly over these conjugative plasmids. While MOB P
plasmids are most frequently targeted in the lagging region (the
plasmid region entering the recipient cell last), MOB F plasmids
are mostly targeted in the leading region (the plasmid region
entering the recipient cell first). Next, we performed an in-depth 
analysis of spacers targeting conjugative plasmids belonging to 
the MOB F family, which is one of the best-studied conjugative 
plasmid families. By studying the CRISPR-mediated targeting 
of conjugative plasmid F by Escherichia coli, we experimentally 
show that the level of CRISPR interference appears to be inde- 
pendent of the target location on the plasmid (leading region 
vs. lagging region). In addition, CRISPR-immunity levels are 
similar for recipient cells targeting the incoming single-stranded 
plasmid DNA, as for recipient cells targeting the DNA strand 
that is newly synthesized in the recipient. Our findings indi- 
cate that single-stranded DNA may not be a target for Type- 
I-E CRISPR-Cas systems. Furthermore, these data suggest that 
either spacers are derived more frequently from the leading 
region of MOB F plasmids during CRISPR adaptation (possibly
due to interrupted mating) or that, for unknown reasons, cells 
targeting the leading region have a higher Darwinian fitness as 
compared with cells targeting the lagging region of MOB F con- 
jugative plasmids. 

Results 

CRISPR systems are biased toward targeting specific regions 
of conjugative plasmids. Conjugative plasmid transfer is a mul- 
tistep process (reviewed in refs. 71 and 72) that requires an oriT 
and a set of transfer proteins encoded by the transfer region of 
the plasmid. Contact between a plasmid-encoded pilus of a donor 
cell and the cell surface of a recipient cell leads to a mating signal, 
pilus retraction and conjugative pore formation. 73 Next, an oriT 
relaxosome complex is formed that causes nicking of one strand 
of the oriT. The relaxosome interacts via the coupling protein 
with the Type IV secretion system. 74 Starting with the 5' end 
(leading region), the nicked strand of the plasmid DNA is then 
transferred from the donor to the recipient cell. After the leading 
region enters the recipient cell first, conjugation generally pro- 
ceeds until the entire plasmid is transferred. Mating is completed 
when the plasmid DNA in the recipient cell is re-circularized and 
the complementary DNA strand is synthesized. 

The well-conserved directionality of conjugative plasmid 
transfer may have implications for target selection/recognition 
during CRISPR adaptation and interference, since the timing of 
DNA entry is different for sequences within the leading region, 
which enter the cell first, than for sequences within the lagging 
region, which enter the recipient cell last. To investigate this, we 
used an in silico approach to check for a correlation between the 
location of protospacers (leading vs. lagging regions) and the frequency
with which they are acquired as spacers by a CRISPR system.
Slightly more than 70,000 spacer sequences (of which roughly 
65,000 were unique) were obtained from the CRISPRdb 14 and 
were locally blasted against 3,167 plasmid sequences taken from 
GenBank. After removing duplicate spacers, self-targeting spacers
(e.g., when a CRISPR-locus is located on a megaplasmid),
as well as spacers having multiple hits on the same target plas- 
mid, we found that 30% of this subset of spacers in the data- 
base match known plasmid sequences. Only protospacers from 
plasmid sequences containing an annotated oriT feature (2.3%
of all plasmids in the database) were selected for further analysis.
To establish the direction of conjugation (and thereby identify
the leading and lagging regions), the oriT-containing plasmids
were screened for the presence of relaxase genes, which are usu- 
ally located in the lagging regions of the plasmid, in close prox- 
imity of the oriT. 75,76 Although the relaxase genes are not located 
at a fixed distance from the oriT, relaxase genes can be used as a 
marker gene for allocating lagging regions of a plasmid. 

This analysis revealed that 375 unique spacers target 39 dif- 
ferent conjugative plasmids, yielding a total number of 506 pro- 
tospacer sequences (Table 1), indicating that many spacers match 
multiple protospacers located on different, possibly related, plasmids.
From this data set, the shortest distance from each protospacer
to the oriT was calculated and expressed as a percentage of
the total plasmid size (Fig. 1A). Since the oriT marks the bound- 
ary between leading and lagging regions of the plasmid, distance- 
scores smaller than 50% are indicative of spacers targeting the 
leading regions, while distance-scores larger than 50% represent 
spacers targeting the lagging regions. 
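
The distance score described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: it assumes that the score is measured from the oriT along the direction in which the plasmid enters the recipient, so that values run from 0 to 100% of the circular plasmid. The function names, the `direction` parameter and the coordinates in the usage lines are hypothetical.

```python
def distance_score(protospacer_pos: int, orit_pos: int,
                   plasmid_len: int, direction: int = 1) -> float:
    """Distance from oriT to a protospacer as a percentage of plasmid size.

    direction is +1 when the leading region runs toward increasing
    coordinates from oriT and -1 when it runs toward decreasing
    coordinates (an assumption of this sketch).
    """
    if plasmid_len <= 0:
        raise ValueError("plasmid_len must be positive")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    # The modulo wraps the offset around the circular plasmid.
    offset = (direction * (protospacer_pos - orit_pos)) % plasmid_len
    return 100.0 * offset / plasmid_len


def region(score: float) -> str:
    """Classify a score with the 50% boundary used in the text."""
    return "leading" if score < 50.0 else "lagging"


# Hypothetical 100 kb plasmid with oriT at position 10,000.
early = distance_score(30_000, 10_000, 100_000)
late = distance_score(5_000, 10_000, 100_000)
print(early, region(early))
print(late, region(late))
```

Wrapping with the modulo keeps protospacers that lie just before the oriT coordinate at the far end of the scale, which matches the idea that they enter the recipient last.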

To analyze whether the distribution of protospacers on these 
plasmids was random, we performed a statistical analysis using 
the Kolmogorov-Smirnov test. This test revealed a statistically 
significant difference between the observed protospacer distribu- 
tion and a uniform protospacer distribution (p = 0.044). The 
majority of spacers were found to target the lagging regions of 
these conjugative plasmids (Fig. 1A). Apart from this, we also 
observed a slight bias toward protospacer occurrence in the lead- 
ing region at distances between 10 and 20% away from the oriT. 
Moreover, a significant clustering of protospacers (p < 0.05) was 
observed for 8 out of 36 plasmids, as determined by compari- 
sons of the circular distributions of protospacers per plasmid to 
uniform distributions using Kuiper's tests. If no clustering
occurred at all, the expected number of plasmids showing significant
clustering (at p < 0.05) would be 0.05 x 36 = 1.8. In that case, the
probability of finding eight or more would be 0.00005. These analyses
show that protospacers display significant clustering, and that
they are slightly more often located within the lagging regions of 
conjugative plasmids. 
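
The reasoning behind the expected number of significant plasmids can be made concrete with a short calculation. The sketch below assumes that the per-plasmid tests are independent and that the count of significant results follows a binomial distribution. The text does not state how the reported probability was obtained, so the tail value computed here is an illustration of the argument and need not coincide with the reported figure. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_plasmids = 36   # plasmids tested for clustering
alpha = 0.05      # significance threshold per plasmid
observed = 8      # plasmids with significant clustering

# Expected number of false positives if no plasmid shows real clustering.
expected_significant = alpha * n_plasmids
# Chance of seeing at least `observed` significant plasmids under that null.
p_eight_or_more = binomial_tail(observed, n_plasmids, alpha)

print(f"expected by chance: {expected_significant:.1f}")
print(f"P(X >= {observed}): {p_eight_or_more:.2e}")
```

Under these assumptions the tail probability is far below 0.05, which is the point of the argument: eight significant plasmids are many more than chance alone would produce.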

The protospacer distribution bias is MOB family-dependent. 
Conjugative plasmids can be grouped into six different MOB 
families, based on the six different families of relaxase genes. 75,77,78 
It is envisaged that differences exist between the different MOB 
family plasmids with respect to the molecular mechanism and 
kinetics of plasmid transfer. The observed bias toward target- 
ing of the lagging region of conjugative plasmids may therefore 
be different for different MOB families. To investigate this, the 
data shown in Figure 1A were re-analyzed for each MOB family 
independently (Fig. S1). While two families were not represented
in this data set due to the lack of annotated oriT (MOB H and
MOB C), the results show that the targeting of lagging regions is
most evident in the MOB P family (n = 351). The MOB F family
(n = 42), however, shows a clear bias for targeting the leading regions.

To extend this analysis to conjugative plasmids lacking an 
annotated oriT, a complementary approach was performed using 
annotated relaxase genes. 75 This analysis is warranted by the fact
that relaxase genes are located in the lagging region of conjugative
plasmids. 76 The MOB F family is a well-characterized family
of conjugative plasmids 77 that is well-suited for this approach
due to two important characteristics. First, the relaxase genes
are well annotated (contrary to oriTs), allowing many members
of this family to be included in our analysis. Second, the direction
of transcription of the relaxase gene is (in the vast majority
of cases) oriented away from the oriT, 76 which can therefore be
exploited as a marker to determine the directionality of the oriT
and, hence, the transition between leading and lagging regions
can be predicted.

Table 1. Specifications from the bioinformatics analysis of spacers
targeting conjugative plasmids

Database
  Total spacers (a)                                    72,431
    - unique                                           65,574
  Total plasmids (b)                                    3,167
    - annotated oriT and relaxase                          48
    - annotated relaxase                                  127

Conjugative plasmids with annotated oriT and relaxase (Fig. 1A)
  BLAST hits                                              506
  Unique spacers                                          375
  Unique plasmids                                          39

Conjugative MOB F plasmids with annotated relaxase (Fig. 1B)
  BLAST hits                                            1,213
  Unique spacers                                          815
  Unique plasmids                                          70

(a) CRISPRdb (crispr.u-psud.fr/crispr). (b) GenBank plasmid database
(ftp://ftp.ncbi.nlm.nih.gov/genomes/Plasmids).

In this way, 127 different MOB F plasmids with known relaxase
gene orientations were obtained, and these were used for
screening the spacer BLAST-hits database. This revealed a total
number of 1,213 protospacers on 70 different MOB F plasmids,
resulting from 815 unique spacers (Fig. 1B, Table 1). Since the
exact position of the oriT site could not be determined, the dis- 
tance-scores were calculated as the shortest distance from each 
protospacer to the start of the relaxase gene. 

A Kolmogorov-Smirnov test of the overall distribution of spacer
hits over the MOB F plasmids (position relative to the relaxase
gene) showed a significant deviation from the uniform
distribution (p = 0.0025).
Protospacers are most frequently located approximately -40% of
the plasmid size away from the relaxase gene (Fig. 1B). Although
the oriT is not taken into account in this analysis, based on the
previous analysis of MOB F plasmids containing an annotated
oriT (Fig. S1) it is likely that this region corresponds to the leading
region of the plasmid. In addition, significant clustering
(p < 0.05) of protospacers was observed for 17 out of 68 plas- 
mids, as determined by comparisons of the circular distributions 
of spacer hits per plasmid to uniform distributions using Kuiper's 
tests. The frequency of plasmids that show statistically significant 
clustering (17 out of 68) is substantially more than expected by 
chance (p < 0.00001). 
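
For readers unfamiliar with Kuiper's test, the statistic it is built on can be computed directly from the distance scores. The sketch below is a plain-Python illustration under stated assumptions: the scores are percentages in the range 0 to 100, the null hypothesis is a uniform distribution around the circular plasmid, and only the statistic V is returned (no p-value is computed). The function name and the example scores are hypothetical and are not taken from the study.

```python
def kuiper_statistic(scores_percent):
    """Kuiper's V = D+ + D- against a uniform distribution on [0, 1)."""
    x = sorted(s / 100.0 for s in scores_percent)
    n = len(x)
    if n == 0:
        raise ValueError("at least one score is required")
    # D+: largest amount by which the empirical CDF exceeds the uniform CDF.
    d_plus = max((i + 1) / n - xi for i, xi in enumerate(x))
    # D-: largest amount by which the uniform CDF exceeds the empirical CDF.
    d_minus = max(xi - i / n for i, xi in enumerate(x))
    return d_plus + d_minus


# Hypothetical scores: evenly spread versus tightly clustered protospacers.
spread = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
clustered = [40] * 10
print(kuiper_statistic(spread))
print(kuiper_statistic(clustered))
```

Unlike the Kolmogorov-Smirnov statistic, V does not change when all positions are rotated by the same amount, so the result does not depend on where the coordinate origin is placed on a circular plasmid.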
[Figure 1B panel: MOB F plasmids; x-axis: Distance to relaxase gene (% of plasmid size)]



Figure 1. Spacers from CRISPRdb targeting conjugative plasmids. (A) Conjugative plasmids, of which the oriT site and the relaxase gene could be identified,
were screened for homology with spacers from the CRISPRdb. After establishing the leading and lagging regions of the plasmid, by taking into
account the location of the relaxase relative to the oriT site, the distance of each spacer hit from the oriT site is expressed as a percentage of the total
plasmid size. These values are depicted as open circles on the plasmid map (left). The red line indicates the protospacer density at the respective position.
The protospacer distribution is also shown in a histogram (right). To this end, the plasmid is divided into 10% segments (i.e., plasmid fragments
corresponding to 10% of the plasmid size). When equally distributed, each 10% segment would carry 10% of all protospacers. The actual percentages
of protospacers present in each 10% segment are indicated by the blue bars. (B) A similar analysis as in (A) was performed, but using only the MOB F
family of conjugative plasmids, and using the relaxase gene start position to calculate distances of the spacer hits.



CRISPR targeting of conjugative plasmid F predominantly 
occurs within the leading region. To experimentally investigate 
the functional importance of the enriched targeting of MOB F 
conjugative plasmids within the leading region, we selected plasmid
F as an exemplary case. The approximately 100 kb conjugative
plasmid F (Fig. 2A) was discovered over 60 years ago as a
sex factor in E. coli K12, 79 and has been well-studied over the
past decades. It encodes the CcdAB toxin/anti-toxin system 
(encoded roughly at position 46.5 kb of plasmid F) to prevent
plasmid loss: the stable CcdB toxin targets the E. coli gyrase in 
the absence of the short-lived CcdA anti-toxin (which therefore 
is absent in plasmid-cured cells) (reviewed in ref. 80). Plasmid F 
belongs to the MOB F1 subfamily of conjugative plasmids, which
includes many members belonging to the IncF, IncN, IncW
and IncP9 incompatibility groups. 77 Most plasmids belonging
to the MOB F1 subfamily are hosted by Gamma-Proteobacteria. 77
The large size of plasmid F, together with the existence of
F-specific phages, indicates that, depending on the conditions,
plasmid F can be a fitness cost to the host. In accordance with
this, in silico analysis reveals that CRISPRdb 14 contains 17 spac- 
ers that match the plasmid F sequence (e-value < 0.05) (Fig. 2D). 
Of these spacers, eight are found in CRISPR loci of E. coli strains
ED1a, LF82, UM146 and O83:H1 NRG857C, whereas the other
nine spacers are found in CRISPR loci of the following Gamma-Proteobacteria:
Klebsiella variicola At-22, Klebsiella oxytoca E718,
Pectobacterium carotovorum PC1 (a.k.a. Erwinia carotovora) (all
belonging to the Enterobacteriaceae) and Cellvibrio japonicus
Ueda107, and in CRISPR loci of Firmicutes (Caldicellulosiruptor
bescii DSM 6725, Caldicellulosiruptor lactoaceticus 6A),
Ignavibacteria (Ignavibacterium album JCM 16511) and Archaea
(Thermococcus onnurineus NA1 and Desulfurococcus kamchatkensis
1221n). In accordance with its classification in the MOB F
subfamily, most spacers (12 out of 17) were found in species
belonging to the class of Gamma-Proteobacteria. F-like plasmids
have a narrow host range, which is restricted to enteric bacteria
(e.g., Escherichia, Klebsiella and Erwinia spp.). 81,82 The observation
that non-enteric species contain spacers that give a significant
BLAST hit with plasmid F may arise from the possibility
that these strains host related plasmids, or, alternatively, these 
strains may have experienced invasion by plasmid F, despite the 
fact that they cannot host this plasmid, as described before. 83,84 
In agreement with the previous analysis, the majority (12 out 
of 17) of the protospacers are located within 50% (of the plas- 
mid size) from the oriT (Fig. 2A and B), containing the leading 
region. The majority of protospacers cluster approximately
40% upstream of the relaxase gene traI (Fig. 2C), similar
to the general trend that was observed for the entire MOB F family
(Fig. 1B). This further substantiates the finding that MOB F family
conjugative plasmids are mainly targeted within the leading
region.

The Type I-E CRISPR-Cas system of E. coli K12 provides 
immunity against conjugative plasmid F. Next, we analyzed 
whether the E. coli K12 Type I-E CRISPR system provides pro- 
tection against conjugative invasion by plasmid F. The Type I-E 
system consists of eight cas genes (cas3, cse1, cse2, cas7, cas5, cas6e,
cas1 and cas2) and CRISPRs with type-2 repeats (Fig. 3A). 85
Expression of the cse1 to cas2 operon is repressed by H-NS 86 and
can be activated by LeuO, 38 or by the BaeSR two-component
pathway. 44,87 To test for resistance against conjugation, we made
use of the plasmid F-derived pOX38-Tc, which corresponds to
the largest HindIII fragment of plasmid F, and contains a tetracycline
resistance marker (Fig. 3B).

Two synthetic CRISPRs were used to test for resistance 
against plasmid invasion. CRISPR-J4 contains four identical 
spacers that target a sequence of phage Lambda, and serves as 
a negative control (Fig. 3C). CRISPR-F contains five different 
spacers that target the pOX38-Tc plasmid. Four of the spacers 
are randomly distributed over the plasmid backbone, whereas
a fifth spacer targets the tetracycline resistance gene (Fig. 3B
and D).

Wild-type E. coli K12 (recipient), transformed with plasmids
containing either CRISPR-J4 or CRISPR-F, shows conjugation
efficiencies of ~5 x 10^-4 transconjugants/donor with MC4100/pOX38-Tc
as donor (Fig. 3E). The lack of CRISPR-dependent
resistance to plasmid conjugation in E. coli K12 is in agreement
with the previously reported silencing of cas gene expression in
this strain by H-NS. 38,86 When E. coli K12Δhns transformed
with CRISPR-F served as a recipient strain, a ~50-fold CRISPR-dependent
reduction in conjugation efficiency was observed as
compared with the same strain transformed with CRISPR-J4
(Fig. 3E). Conjugation efficiencies in the K12Δhns strains (~2
x 10^-5 transconjugants/donor) appear to be ~100-fold lower than
in the wild-type cells, which is likely due to growth inhibition
caused by hns deletion. 88
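
The two quantities used throughout these experiments, conjugation efficiency and fold reduction, are simple ratios. The following sketch shows the arithmetic with hypothetical colony counts. The counts, function names and variable names are illustrative assumptions and are not data from this study.

```python
def conjugation_efficiency(transconjugants: float, donors: float) -> float:
    """Transconjugants per donor cell (both given in the same units, e.g., CFU/ml)."""
    if donors <= 0:
        raise ValueError("donor count must be positive")
    return transconjugants / donors


def fold_reduction(control_efficiency: float, targeting_efficiency: float) -> float:
    """Ratio of the non-targeting control efficiency to the targeting efficiency."""
    if targeting_efficiency <= 0:
        raise ValueError("fold reduction is undefined without transconjugants")
    return control_efficiency / targeting_efficiency


# Hypothetical plate counts in CFU/ml.
control = conjugation_efficiency(2.0e3, 1.0e8)     # non-targeting CRISPR
targeting = conjugation_efficiency(4.0e1, 1.0e8)   # plasmid-targeting CRISPR
print(f"control efficiency:   {control:.1e}")
print(f"targeting efficiency: {targeting:.1e}")
print(f"fold reduction:       {fold_reduction(control, targeting):.0f}")
```

When no transconjugants are recovered at all, the ratio is undefined, which is why the sketch raises an error in that case; such results are usually reported relative to the detection limit instead.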

When the same conjugation experiments were performed using 
E. coli BL21 (DE3) transformed with plasmids encoding cas genes 
and CRISPR as a recipient strain (and MC4100/pOX38-Tc as
donor), high-level CRISPR-interference was observed, leading to 
a 104-fold reduction in conjugation efficiency (as compared with 
recipient cells carrying a non-targeting CRISPR) when cas gene 
and CRISPR expression were not induced, while conjugation was 
entirely abrogated when expression of cas genes and CRISPR in 
the same strain was induced with 1 mM IPTG (Fig. 3F). These 
data demonstrate that E. coli K12 and BL21(DE3) strains are 
efficiently protected against conjugative invasion by plasmid 
pOX38-Tc. 

Resistance levels conferred by Type I-E CRISPR-Cas are
independent of the DNA strand and the region of plasmid F
that is being targeted. The experimental system of CRISPR-mediated
resistance in E. coli K12Δhns against plasmid pOX38-Tc
that is described above allows for testing hypotheses that may
explain the observed bias toward preferential targeting of leading 
sequences of MOB F conjugative plasmids under conditions that 
do not make use of overexpression of CRISPR and cas genes. A 
possible reason for the observed bias could be that cells contain- 
ing spacers targeting the leading region of a MOB F conjugative 
plasmid are more resistant to conjugation than cells targeting the 
lagging region. To investigate this, four synthetic CRISPRs were 
designed, two of which contain a single spacer that targets the 
leading region of pOX38-Tc, while the other two CRISPRs tar- 
get the lagging region of pOX38-Tc (Fig. 4A). All four CRISPRs 
target the incoming strand of pOX38-Tc. The CRISPRs targeting
the leading region were named CRISPR-F-IE1 and CRISPR-F-IE2
[incoming early 1 and 2 (i.e., targeting the leading region
of the incoming strand)]. The CRISPRs targeting the lagging
region were named CRISPR-F-IL1 and CRISPR-F-IL2 (incoming
late 1 and 2) (Fig. 4A).

Conjugation efficiencies (transconjugants/donor) of pOX38-Tc
from MC4100 donor cells to E. coli K12Δhns transformed with
any of these four synthetic CRISPRs are shown in Figure 4C.
Some spacer sequences are more effective in protecting against
plasmid transfer than other spacers: CRISPR-F-IE1 and CRISPR-F-IL1
provide high levels of resistance (~100-fold reduction in
conjugation efficiencies), whereas CRISPR-F-IE2 provides a
lower level of resistance (15-fold reduction in conjugation efficiencies).
Interestingly, the level of immunity conferred by the
CRISPR-F variants appears to be independent of the plasmid
target region (i.e., leading vs. lagging regions). Since no differences
in immunity levels were found, the observed protospacer
distribution bias is unlikely to be due to differences at the level of
CRISPR-interference.

[Figure 2, panels B and C; x-axes: Distance to oriT (% of plasmid size) and Distance to relaxase (% of plasmid size)]

(D)

Cellvibrio japonicus Ueda107 CRISPR 4 spacer 33
Query     3  GTGGTGGCCGTTGGCCTGGTGGCGTGGGCA  32
Sbjct 98590  GTGCTGACCGTTGGTCTGGTGTCGTGTGCA  98619

Thermococcus onnurineus NA1 CRISPR 1 spacer 3
Query     3  AAACATTTCCATCCACCTCCATCTTTTTGC  32
Sbjct 40140  AAAAATTTCCTTGCACATCCATTTTTTTGC  40111

Escherichia coli ED1a CRISPR 2 spacer 17
Query     1  CTGAACGTTGAAGAGTGCGA  20
Sbjct 56779  CTGAACGTTGAAGAGTGCGA  56798

Escherichia coli ED1a CRISPR 2 spacer 6
Query     1  TGGCGCGCCTGGAGGACATCCCGGAAGACCAGC  33
Sbjct 55239  TGGCGCGCCTGGAGGACATCCCGGAAGACCAGC  55271

Escherichia coli ED1a CRISPR 2 spacer 7
Query     1  GGTAACGGGTCAGGCCGACCGTGGAATACCGC  32
Sbjct 25936  GGTAACGGGTCAGGCCGACCGTGGAATACCGC  25905

Escherichia coli ED1a CRISPR 2 spacer 8
Query     1  ATAAAGCGATCGACGCGGTTCCAGCCATAGAA  32
Sbjct 54407  ATAAAGCGATCGACGCGGTTCCAGCCATAGAA  54376

Escherichia coli ED1a CRISPR 2 spacer 9
Query     6  CATCACAAACGGGAAGGGTAACGGGCGA  33
Sbjct 57958  CATCACAAACAGGAAGGGTAACGGGCGA  57931

Escherichia coli ED1a CRISPR 3 spacer 8
Query     1  GTTCCGTATACCAGTGATATGCAGAAGGAACA  32
Sbjct 53454  GTTCCGTATACCAGTGATATGCAGAAGGAACA  53423

Caldicellulosiruptor bescii DSM 6725 CRISPR 4 spacer 18
Query    14  TCTTTCTCTCTCTTGCATCATCAC  37
Sbjct 31642  TCCTTCTCTCTCTGGCATCATCAC  31619

Pectobacterium carotovorum carotovorum PC1 CRISPR 4 spacer 2
Query    14  CTCTTCATCTTCCTCATCTTCTTCCTC  40
Sbjct 58030  CTCTTCCTCTTCTTCCTCTTCTTCCTC  58056

Klebsiella variicola At-22 CRISPR 2 spacer 23
Query     1  TCGACGATGTTCTGCGTGATGGTGATGTATGC  32
Sbjct 75947  TCGACGATGTTCTGTGTGATGGTGATATAGGC  75916

Caldicellulosiruptor lactoaceticus 6A CRISPR 5 spacer 4
Query     8  AATAAAAACATCCACTATAACATTAA  33
Sbjct 67336  AATAAAAACATCCTCTACAAGATTAA  67311

Ignavibacterium album JCM 16511 CRISPR 2 spacer 76
Query    10  AAACATATCTAACATCCTTTTA  31
Sbjct 22477  AAACATATCTAACATACTTTTA  22498

Escherichia coli UM146 CRISPR 3 spacer 3
Query     1  TGTGGCGCTGATGCGTCTGGGCGTCTTTGTAC  32
Sbjct 37558  TGTGGCGCTGATGCGTCTGGGCGTTTTTGTAC  37527

Escherichia coli O83:H1 str. NRG 857C CRISPR 2 spacer 9
Query     1  CTGAACGTTGAAGAGTGCGACCGTCTCTCCTT  32
Sbjct 56779  CTGAACGTTGAAGAGTGCGACCGTCTCTCCTT  56810

Klebsiella oxytoca E718 CRISPR 2 spacer 3
Query     1  GCAGCCTGAACAGCAAGGGTGCGCTCTGAAGA  32
Sbjct 60868  GCTGCCTGGACTGCCAGCGTGCGCTCTGAGGA  60837

Desulfurococcus kamchatkensis 1221n CRISPR 1 spacer 80
Query     4  AGAGAAACTAAAAGAGATGAAGCTCGAGAAGGTGATC  40
Sbjct 12488  AGAGAAAGTTAAAAATATGAAATTCGAGAATGTTATC  12452

Figure 2. Spacers from CRISPRdb targeting plasmid F. (A) Map of plasmid F indicating the size of the plasmid and the location of the origin of transfer
(oriT), the relaxase gene, the leading region (which enters the recipient cell first) and the transfer region (which encodes the genes essential for plasmid
transfer). Asterisks indicate the approximate positions of the protospacers listed in (D). (B) Similar analysis as presented in Figure 1A, where the distance
of each spacer hit on plasmid F from the oriT or (C) from the relaxase gene is calculated and expressed as a percentage of the total plasmid size.
(D) Alignments of spacer sequences ("Query," top sequences) and the corresponding plasmid F sequences ("Sbjct", bottom sequences). The species,
CRISPR and spacer are indicated above each alignment, following the nomenclature used by the CRISPRdb. The numbers adjacent to the alignment
indicate the position of the spacer and protospacer sequence, respectively.
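
As a small worked check on the spacer alignments of Figure 2D, the number of matching positions between a spacer ("Query") and its plasmid F protospacer ("Sbjct") can be counted directly. The sketch below uses two sequence pairs copied from the figure and assumes an ungapped, position-by-position comparison of the aligned stretch as printed. The function name is hypothetical, and this is not the scoring used by BLAST itself.

```python
def alignment_identity(query: str, subject: str) -> tuple[int, int]:
    """Return (matching positions, aligned length) for an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query)


# Cellvibrio japonicus Ueda107, CRISPR 4 spacer 33 (Figure 2D).
cj_query = "GTGGTGGCCGTTGGCCTGGTGGCGTGGGCA"
cj_sbjct = "GTGCTGACCGTTGGTCTGGTGTCGTGTGCA"

# Escherichia coli ED1a, CRISPR 2 spacer 17 (Figure 2D).
ec_query = "CTGAACGTTGAAGAGTGCGA"
ec_sbjct = "CTGAACGTTGAAGAGTGCGA"

for name, q, s in [("C. japonicus", cj_query, cj_sbjct),
                   ("E. coli ED1a", ec_query, ec_sbjct)]:
    matches, length = alignment_identity(q, s)
    print(f"{name}: {matches}/{length} positions match")
```

The comparison illustrates the contrast visible in the figure: the E. coli spacers match plasmid F perfectly over the aligned stretch, whereas spacers from more distant species carry several mismatches.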

During plasmid conjugation, single-stranded plasmid DNA 
is transferred by a Type IV secretion system from the donor to 
the recipient cell, and synthesis of the complementary DNA strand is generally
believed to occur after re-circularization of the conjugative plas- 
mid in the recipient cell. 89 At present, it is unknown whether 
Type-I-E CRISPR-Cas systems target single-stranded DNA. To 
test whether differences exist in CRISPR immunity levels that are 
associated with the DNA strand that is being targeted, four addi- 
tional synthetic CRISPRs were constructed that target the syn- 
thesized strand of pOX38-Tc. Of these four CRISPRs, two target 
the leading region and two target the lagging region of pOX38- 
Tc (Fig. 4B). These CRISPRs were named CRISPR-F-SE1 and 
CRISPR-F-SE2 (synthesized early 1 and 2) and CRISPR-F-SL1 
and CRISPR-F-SL2 (synthesized late 1 and 2) (Fig. 4B). 

CRISPR immunity levels appear to be independent of the 
plasmid strand that is being targeted, as determined by mea- 
suring the conjugation efficiencies (transconjugants/donor) of 
pOX38-Tc from MC4100 donor cells to E. coli K12Δhns carrying
the CRISPR-F variants (Fig. 4D). Again, some variation exists in 
the levels of immunity. Since the AT-content of the spacers and 
the PAM sequences flanking the protospacers are identical in all 
cases, this suggests that additional factors play a role in deter- 
mining the level of interference conferred by a spacer sequence. 
Altogether, it seems that the level of resistance is independent of 
whether the leading or the lagging region is being targeted by a 
CRISPR spacer and whether the incoming or synthesized strand 
is being targeted. These data indicate that single-stranded DNA 
may not be a target for Type I-E CRISPR-Cas systems, and sug- 
gest that the protospacer distribution bias might be due to spacer 
acquisition preferences. 

Discussion 

CRISPR/Cas systems are highly versatile immune systems that 
can block invading DNA to provide immunity against phage 
infections, prevent transfer of conjugative plasmids or trans- 
posons, and can remove resident DNA such as plasmids. The 
spacer content of CRISPRs in natural populations appears to 
be highly dynamic, with frequent spacer acquisition and spacer 
loss. 46,48-51,90-93 The adaptive properties of CRISPR-Cas make
them uniquely capable of modulating the mobilome of a spe- 
cies. 46 Indeed, these systems are well suited to remove a given 
DNA element, such as a conjugative plasmid, when such an ele- 
ment is associated with a fitness cost. If environmental conditions 
change such that the DNA element provides a selective advan- 
tage, natural selection will favor clones that have lost or mutated 
the corresponding spacer sequence. 

In this study, we have analyzed the role of CRISPR-Cas in 
providing resistance against conjugative plasmids. Screening for
spacers matching conjugative plasmid sequences revealed that
their cognate target sequences are not randomly dispersed along 
the plasmid DNA sequence. Instead, our analysis showed that 
spacers preferentially target the lagging region, while a weaker
trend for targeting the leading region could also be observed
(Fig. 1A). The different conjugative plasmids included in this
analysis were selected on the basis of an annotated oriT site, which
might not be representative of the whole population of conjugative
plasmids. Indeed, the data set was enriched for three particular
classes of conjugative plasmids: MOB P (n = 351, 69%), MOB F
(n = 42, 8%) and MOB Q (n = 33, 6.5%) (Fig. S1). Interestingly,
MOB P-class plasmids were found to be targeted in the lagging
regions, while MOB F-class plasmids are more often targeted in
the leading regions. The rationale for this is currently unclear.

Since the total number of MOB F plasmids containing an
annotated oriT was rather low (n = 42), we conducted a com- 
plementary approach by screening all known MOB F plasmids 
for the presence of a relaxase gene and using its start position 
to calculate relative distances to the target sequences. The ori- 
entation of plasmid transfer was established by the direction of 
transcription of the relaxase gene that is pointing away from the 
oriT. 76 In agreement with the first analysis, this approach demonstrated
that spacer sequences targeting MOB F family conjugative
plasmids are significantly enriched for sequences that likely correspond
to the leading region of these plasmids (Fig. 1B). The
results clearly demonstrate that CRISPR-Cas-targeted regions on 
conjugative plasmids are not randomly dispersed, which is rein- 
forced by the finding that protospacers on the same plasmid are 
significantly clustered. 

It should be kept in mind that we did not determine whether 
the protospacers found by BLAST searches would still support 
CRISPR interference, as our analysis lacks the ability to take into 
account biologically important features for the various CRISPR/ 
Cas types such as the presence of a PAM 36,53,94 and seed region of
the protospacer. 32,43 In addition to the protospacer distribution 
analysis presented here, it would also be interesting to investigate 
whether a bias exists for the strand of the conjugative plasmid 
that is being targeted. Unfortunately, the direction of transcrip- 
tion of most CRISPR arrays listed in the CRISPRdb is unknown, 
hence making it impossible to determine which strand of the con- 
jugative plasmid is being targeted. Nevertheless, the distribution 
of protospacers on conjugative plasmids, even when these pro- 
tospacers no longer support CRISPR interference due to escape 
mutations, provides a valuable insight into host defense strategies 
against conjugative plasmids. 

Using plasmid F as an exemplary case, we have investigated 
the biological basis for the biased protospacer distribution. 
We propose two explanations that are not mutually exclusive. 
The first explanation would be a bias occurring at the level of 
CRISPR adaptation. This hypothesis suggests that sequences 
from the leading region of MOB F plasmids are preferentially
selected for integration as a novel spacer into a CRISPR array. 
Such an effect could, for example, be caused by interrupted 
mating events (i.e., the process where a mating pair is disrupted 
during conjugational plasmid transfer), which leads to partial 
transfer of the conjugative plasmid DNA. Hence, only DNA 
entering the recipient cell first (corresponding to the leading
region) would be transferred and subsequently be available for a 
spacer integration event in the recipient cell. Since the incoming 
DNA is single stranded during plasmid conjugation, this model 
would have implications for the mechanism of spacer acquisition 
(i.e., single-stranded pre-spacers would need to be substrates for 
spacer acquisition)." 5,19,95 We have aimed to examine this hypoth- 
esis by following spacer integration under laboratory conditions 
in response to plasmid F, using both E. coli K12Δhns cells and
E. coli K12Δcse1 cells as recipients. Whereas spacer acquisition
into the Δhns strain would be subject to selection both at the
level of CRISPR adaptation and at the level of CRISPR interference,
spacer acquisition into the Δcse1 strain would be subject to
selection at the level of CRISPR adaptation only (this strain dis- 
plays high level expression of the cas genes downstream of the 
integrated kanamycin resistance cassette 39 ). Hence, differences 
between newly acquired spacers by these two strains with respect 
to the regions that are being targeted would provide insight into 
the biological basis causing the biased targeting of the leading 
region of MOB F plasmids. However, despite numerous attempts,
spacer acquisition in response to conjugative plasmid transfer 
could not be observed (data not shown). This observation contrasts
with previously described spacer acquisition during non-mobile
plasmid curing by E. coli K12Δhns.

An alternative explanation for the biased targeting of the lead- 
ing region of MOB F conjugative plasmids would be that this 
leads to more efficient CRISPR-interference as compared with 
targeting other regions of the conjugative plasmid. However, 
using synthetic CRISPRs targeting either the leading or the lag- 
ging region of plasmid F, we found no evidence for differences in 
resistance levels. In addition, we did not observe differences in 
the effectiveness of CRISPR arrays targeting either the incoming 
strand directly or the synthesized strand, which might suggest 
that CRISPR interference takes place on the double-stranded 
protospacer target only. This is in agreement with the recently 
determined Cas3 cleavage site in the displaced strand of the 
Cascade-induced R-loop. 96 

One factor that could influence both CRISPR-adaptation and 
interference is local DNA topology, 30,97 which can be influenced 
by DNA structuring proteins, such as H-NS. 98-101 This could
cause increased exposure of defined regions of the plasmids to the 
integration and interference machineries. These regions may, for 
example, be linked to transcriptionally active or inactive regions 
and may be conserved within MOB plasmid families. 

In addition to the analyses presented here, it will be interest- 
ing to perform competition experiments between strains carry- 
ing different CRISPR-F variants, in order to obtain insight into 
Darwinian fitness associated with resistance and possible dif- 
ferences between strains due to the plasmid region that is being 
targeted. One possibility that we consider is that MOB F plasmid- 
encoded addiction systems may cause toxic effects when the 
plasmid is targeted in the lagging region, while this toxic effect 
is avoided by targeting of the leading region, since the plasmid 
will be degraded earlier during conjugation. Such effects may not 
influence resistance levels but could affect Darwinian fitness of 
the host. 



This study has demonstrated that the E. coli CRISPR-Cas 
system effectively protects against conjugative plasmid transfer. 
Furthermore, we have shown that a biased protospacer distri- 
bution exists on conjugative plasmids, which is MOB-family- 
dependent. Whereas some MOB families are mainly targeted 
within the leading regions (e.g., MOB F), others are targeted
more frequently within the lagging regions (e.g., MOB P). Future
research is required to explore the biological factors that cause 
this biased protospacer distribution. The hypotheses that are pro- 
posed in this study aim to provide a framework for this exciting 
field of research. 

Materials and Methods 

Strains, gene cloning, plasmids and vectors. E. coli K12 
BW25113, E. coli K12Δhns (from the KEIO collection) and
E. coli BL21 (DE3) (Novagen) strains were used throughout 
the study as recipient strains. MC4100 carrying pOX38-Tc was 
used as a donor strain. Plasmids pWUR400 and pWUR397, 
which encode the Cascade genes and cas3, have been described 
previously, 22,29 and pWUR692, which encodes CRISPR-F, or 
pWUR691, which encodes CRISPR-J4, were used to express
the cas genes and CRISPR in BL21(DE3). Synthetic CRISPRs
were expressed in E. coli K12Δhns by introducing the following
plasmids: pWUR692 (CRISPR-F), pWUR691 (CRISPR-J4),
pWUR693 (CRISPR-F-IE1), pWUR694 (CRISPR-F-IE2),
pWUR695 (CRISPR-F-IL1), pWUR696 (CRISPR-F-IL2),
pWUR697 (CRISPR-F-SE1), pWUR698 (CRISPR-F-SE2),
pWUR699 (CRISPR-F-SL1) and pWUR700 (CRISPR-F-SL2).
CRISPR-encoding plasmids were generated by subcloning syn- 
thetic CRISPRs (GeneArt) in pACYC-duet vectors using the 
NcoI and KpnI sites. A full description of the plasmids used in
this study is provided in Table S1.

Bioinformatics analyses. Spacer sequences [taken from the 
CRISPRdb website (http://crispr.u-psud.fr/crispr/)] and plasmid 
sequences [taken from the GenBank FTP site (ftp://ftp.ncbi.nlm. 
nih.gov/genomes/Plasmids/)] were collected on September 26, 
2012. Conjugative plasmid sequences were extracted from the initial
plasmid database by screening them for the oriT feature key or
by screening for annotated relaxase genes. 77 In some cases, identification
of the relaxase gene was performed by manual inspection
of the plasmid's feature table. BLAST (version.2.27) was used to
screen for spacer hits in a locally generated BLAST database that
was constructed using the above-mentioned plasmids. The BLAST 
settings included a word size of 7, with an expectation value threshold
of 0.1. The costs to open and extend a gap were set at 5 and 2,
respectively, with a +1 reward for nucleotide matches, and a -1 penalty
for nucleotide mismatches. Results were imported into Excel for
further processing and to calculate the distances.
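
The distance calculation described above can be illustrated with a short Python sketch. It is not the authors' procedure (they used Excel): the function name, its arguments and the leading_is_forward flag are hypothetical, and the sketch assumes a circular plasmid, coordinates on a single numbering scheme, and a transfer direction that is already known.

```python
def relative_position(hit_pos, ref_pos, plasmid_len, leading_is_forward=True):
    """Return the position of a spacer hit as a percentage of plasmid size.

    0% is the reference site (oriT or the relaxase gene start). Small values
    lie in the leading region, values near 100% lie in the lagging region.
    All names here are illustrative assumptions, not taken from the study.
    """
    if plasmid_len <= 0:
        raise ValueError("plasmid_len must be positive")
    # Walk around the circular plasmid from the reference site in the
    # direction in which the DNA enters the recipient cell.
    if leading_is_forward:
        offset = (hit_pos - ref_pos) % plasmid_len
    else:
        offset = (ref_pos - hit_pos) % plasmid_len
    return 100.0 * offset / plasmid_len


if __name__ == "__main__":
    # A hit 500 bp into the leading region of a 10 kb plasmid.
    print(relative_position(1500, 1000, 10000))
    # A hit 500 bp on the other side of the reference site (lagging region).
    print(relative_position(500, 1000, 10000))
```

The modulo operation handles hits that lie "before" the reference site in sequence coordinates, so that they are reported near 100% rather than as negative distances.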

Statistical analysis. For statistical analysis we used the R 
program, version 2.14.2, especially the R-package circular. 102 To 
test for possible clustering of spacer hits within the plasmids, the 
Kolmogorov-Smirnov test was applied, which compares the dis- 
tribution of spacer hits over the plasmids (relative to oriT in one 
analysis, and relative to the relaxase gene for the other analysis, 
as indicated in the text) to the uniform distribution. For this 
analysis, the spacer position within a plasmid was specified as
a percentage of plasmid size, with small percentages indicating
positions close to the reference site (oriT/relaxase) in the direction
of the leading region of the plasmid. A percentage of 50%
indicates a position on the plasmid opposite the reference site and 
a percentage close to 100% indicates a position close to the refer- 
ence site, in the lagging region of the plasmid. 
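
As a rough illustration of the comparison with the uniform distribution, the following Python sketch computes the one-sample Kolmogorov-Smirnov statistic D for spacer positions expressed as percentages. It is a minimal sketch and not the R code used in the study; the function name is hypothetical, and a p-value would in practice be obtained from a standard routine such as R's ks.test.

```python
def ks_uniform(percentages):
    """One-sample Kolmogorov-Smirnov statistic D against Uniform(0, 100).

    `percentages` are spacer-hit positions expressed as a percentage of
    plasmid size (illustrative input format, assumed here).
    """
    xs = sorted(p / 100.0 for p in percentages)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one position")
    # Largest amount by which the empirical CDF exceeds the uniform CDF ...
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # ... and by which it falls below the uniform CDF.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    # Hits crowded near the reference site in the leading region give a large D.
    print(ks_uniform([1, 2, 3, 4]))
    # Evenly spread hits give a small D.
    print(ks_uniform([12.5, 37.5, 62.5, 87.5]))
```

A large D indicates that hits are concentrated in part of the 0 to 100% range instead of being spread evenly over the plasmid.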

To test for clustering of spacers within individual plasmids, 
we chose a circular approach, so that e.g., spacers on positions 
1% and 99%, far apart on the linear scale, were close together on 
the circular scale. Testing for uniformity of the circular distribution
of spacers per plasmid was done by applying Kuiper's test. To
test whether the number of plasmids with significant clustering 
was higher than expected by chance, we performed a binomial 
test. 
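
The circular test and the subsequent binomial test can be sketched in Python as follows. This is an illustration only: the study used the R package circular, the function names are hypothetical, and the per-plasmid false-positive rate p passed to the binomial function is an assumption that would have to match the significance level actually used.

```python
from math import comb


def kuiper_uniform(percentages):
    """Kuiper's statistic V = D+ + D- against the circular uniform distribution.

    Unlike the Kolmogorov-Smirnov statistic, V does not depend on where the
    origin of the circle is placed, so 1% and 99% count as neighbours.
    """
    xs = sorted((p % 100.0) / 100.0 for p in percentages)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one position")
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return d_plus + d_minus


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided exact binomial test.

    k: plasmids with significant clustering, n: plasmids tested,
    p: chance that a plasmid is called significant under the null (assumed).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # The same tight pair of hits, placed at the origin or opposite to it,
    # yields the same V.
    print(kuiper_uniform([1, 99]), kuiper_uniform([49, 51]))
    # Hypothetical counts: 6 of 20 plasmids significant at a 5% level.
    print(binomial_upper_tail(6, 20, 0.05))
```

The rotation invariance of V is the reason a circular statistic suits positions on a circular plasmid.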

Conjugation experiments. Conjugation experiments were 
performed by diluting an overnight culture of the donor strain 
MC4100 carrying pOX38-Tc 1:50 into fresh LB supplemented 
with tetracycline (10 μg/ml) and by diluting an overnight
culture of the recipient strain 1:50 into LB supplemented with the
appropriate antibiotics. When donor and recipient cells reached
an OD600 of 0.6 they were transferred into fresh LB lacking
antibiotics: 0.25 ml donor cells and 1 ml recipient cells were 
added to 5 ml LB. Conjugation was allowed for 4 h at 37°C 
without shaking. For the experiments where BL21(DE3) served 
as a recipient strain, 1 mM IPTG was added to the medium 
during conjugation. Next, cells were plated on LB containing 
1.5% agar and appropriate antibiotics [i.e., tetracycline (10 μg/ml)
for plating MC4100 + pOX38-Tc donor cells; kanamycin
(50 μg/ml) and chloramphenicol (34 μg/ml) for plating recipient
Δhns strains carrying CRISPR-encoding pACYC-duet
plasmids; kanamycin, streptomycin (50 μg/ml) and chloramphenicol
for plating recipient BL21(DE3) strains carrying
CRISPR-encoding pACYC-duet plasmids, Cascade-encoding
pCDF-1b and Cas3-encoding pRSF-1b; both tetracycline and
chloramphenicol to plate transconjugants]. Colony counting
allows for calculation of conjugation efficiencies (expressed as 
#transconjugants/#donor cells). 
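
The efficiency calculation amounts to a simple ratio, sketched below in Python. The function names and the example counts are hypothetical; the sketch assumes that donor and transconjugant colony counts have already been corrected to the same volume and dilution.

```python
def conjugation_efficiency(transconjugants, donors):
    """Conjugation efficiency expressed as transconjugants per donor cell."""
    if donors <= 0:
        raise ValueError("donor count must be positive")
    return transconjugants / donors


def fold_reduction(control_efficiency, targeted_efficiency):
    """Fold reduction in efficiency of a targeting CRISPR versus a control."""
    if targeted_efficiency <= 0:
        raise ValueError("targeted efficiency must be positive")
    return control_efficiency / targeted_efficiency


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    control = conjugation_efficiency(transconjugants=1500, donors=3000)
    targeted = conjugation_efficiency(transconjugants=100, donors=4000)
    print(control, targeted, fold_reduction(control, targeted))
```

Expressing immunity as a fold reduction relative to a non-targeting control makes CRISPR variants comparable even when donor counts differ between matings.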